Abstract

We characterise the probability distributions that arise from quantum circuits all of
whose gates commute, and show when these distributions can be classically simulated
efficiently. We consider also marginal distributions and the computation of correlation
coefficients, and draw connections between the simulation of stabiliser circuits and the
combinatorics of representable matroids, as developed in the 1990s.
1 Introduction

It is widely held that quantum computation is a useful paradigm for algorithmics because
it enables us to describe a wider class of algorithms than can be described classically,
and some of these may represent qualitatively faster solutions to certain problems than
could ever be obtained classically, in principle. To put this sentiment on a more rigorous
footing, one often sees the conjecture phrased in terms of classes of decision languages
circumscribed by asymptotic time constraints, e.g. Conjecture: BQP ≠ BPP. A more
general approach looks not at decision languages arising from uniform families of circuits,
but at uniform families of probability distributions arising from uniform families of circuits.
Then instead of asking "Can this decision language be computed efficiently?" one
asks more general questions such as "Can this probability distribution be sampled from
efficiently? Can this probability distribution be approximately sampled from efficiently?
Can we efficiently compute the correlation coefficients of this distribution? Given samples
from the distribution, can we efficiently verify the hypothesis that they were thus sampled?"
and so on. The reader is directed to [4] for more complexity-theoretic background
and analysis on questions of this type.

In this paper, we restrict attention to the combinatorial analysis of a particular kind of
uniform family of quantum circuits called IQP, first introduced in [12] and discussed
at greater length in [11]. We call such circuits X-programs (see §3) because they are
described by Hamiltonians composed entirely of Pauli X operators. From the perspective
of implementation, these circuits can be rendered by any model for computing in the class
$\mathrm{QNC}^0_f$ (defined in [7]). In this paper (§3), we fully describe the probability distributions
that they generate, and also give formulas for the correlation coefficients associated to these
distributions. We show when these probabilities and correlations are computable efficiently
classically, and when they are not. We recall the argument [11, 4] for why it is generally
impossible to sample efficiently classically from these distributions, unless the polynomial
hierarchy collapses. This is by no means the first time a fully combinatorial treatment has
been made of physical problems (e.g. [15]).

All of these results depend on describing X-programs in terms of binary matroids; equivalently
binary linear codes. We begin by recalling the relationship between matroids and
codes, and we note a link between the efficient simulation of Clifford circuits [2] and
Vertigan's algorithm for binary matroids [13]. Our aim is to provide a fairly self-contained
reference, so all of the combinatorial definitions needed will be given up front in §2, where 
(almost) no mention will be made of quantum computation. 



2 Combinatorics 

This section introduces the necessary combinatorics notation and ideas. We give three 
propositions, together with sketch proofs. To proceed to the results of §3, the uninitiated 
reader will want to understand the statements of these propositions, though perhaps not 
necessarily learn the methods of their proofs. The key subsection here is §2.3, where the 
main computational complexity results are recalled. In §2.4 we give worked examples of 
the transformations that are used later. 

Let $P$ be any binary matrix, that is, any matrix over $\mathbb{F}_2$ having, say, $n$ rows and $l$ columns
and rank $r$. To it we associate a binary linear code $\mathcal{C} = \mathcal{C}(P)$ and a binary matroid
$\mathcal{M} = \mathcal{M}(P)$. The code $\mathcal{C}$ will be generated by the columns of $P$, and the matroid $\mathcal{M}$ will be
coordinatized by the rows of $P$. Such codes and matroids are in one-to-one correspondence
(Proposition 1 below), so everything that can be said about the one can be said about the
other: accordingly we will try to say everything twice, for clarity.



2.1 Formal definitions 

Formally, $\mathcal{C}$ is defined to be the subset of the vector space $\mathbb{F}_2^n$ that is generated by linearly
combining columns of $P$. The cardinality of the code is thus $2^r$, so we call $r$ its rank. The
elements of $\mathcal{C}$ are called codewords, and these may also be thought of as binary strings
of length $n$. The parameter $n$ is called the length of the code. The parameter $l$ is not
well-defined for the code, only for its generator matrix $P$, though clearly $r \le l$.

Formally, $\mathcal{M}$ is defined to be the isomorphism class of the (multi)set $E$ of rows of $P$ (vectors
in the vector space $\mathbb{F}_2^l$) together with the induced rank function $\rho_{\mathcal{M}}$ that maps subsets
of these row-vectors to their rank, i.e. to whichever integer counts the dimension of the
subspace of $\mathbb{F}_2^l$ that they span. In particular, $\rho_{\mathcal{M}}(\emptyset) = 0$ and $\rho_{\mathcal{M}}(E) = r$, by definition.
The parameter $n$ is called the size of the matroid, and $r$ is called its rank. The parameter $l$ is
not well-defined for the isomorphism class, since the row-vectors could easily be embedded
in a larger vector space, though clearly $r \le l$.

Proposition 1 For binary matrices $P$ and $Q$, the following are equivalent:

• $P$ and $Q$ generate the same code, $\mathcal{C}(P) = \mathcal{C}(Q)$;

• $P$ and $Q$ generate the same matroid, $\mathcal{M}(P) = \mathcal{M}(Q)$;

• there exist binary matrices $R$ and $R'$ such that $P = Q \cdot R$ and $P \cdot R' = Q$.

To prove this, we give a normal form for a matrix $P$. First, identify a basis for the matroid
$\mathcal{M}(P)$, that is, find $r$ rows that collectively have rank $r$. Without loss of generality, i.e.
for illustration, we assume these to be the first $r$ rows of $P$. Then we can write

$$P = \begin{pmatrix} B \\ C \end{pmatrix} = \begin{pmatrix} I \\ D \end{pmatrix} \cdot B = P' \cdot B, \qquad P' := \begin{pmatrix} I \\ D \end{pmatrix},$$

where $B$ is the submatrix given by the first $r$ rows, and $C$ is the remaining submatrix.
Here $I$ is an $r$-by-$r$ identity matrix. We can find $D$ uniquely satisfying $D \cdot B = C$ precisely
because $B$ is a basis for the space to which every row of $P$ belongs. This kind of reduction
can be achieved by Gaussian elimination, and $P'$ is called an echelon reduction of $P$.
Echelon reduction is unique once an ordered basis is chosen, but we are free to choose
any basis with any ordering. Elsewhere in the paper we use the notation $P'$ to denote an
echelon reduction of $P$.

The reader may complete the proof of the Proposition, using this idea of identifying a basis 
set and an echelon reduction. ■ 
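The reduction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `echelon_reduction`, the representation of rows as tuples of 0/1, and the greedy choice of basis (the first independent rows encountered, rather than literally the first $r$ rows) are all choices made here, not taken from the paper.

```python
def echelon_reduction(P):
    """Return (P_prime, B) with P = P_prime * B over F_2.

    P is a list of rows (tuples of 0/1). B collects the first rows of P
    found to be linearly independent (a greedy basis; an assumption of this
    sketch), and P_prime holds the coordinates of every row of P in that basis.
    """
    to_int = lambda row: int("".join(map(str, row)), 2)
    B = []        # chosen basis rows, stored as integers
    reduced = []  # pairs (vector, mask): vector = XOR of B[k] over bits k of mask
    coords = []   # for each row of P, a bitmask over basis indices
    for row in P:
        v, mask = to_int(row), 0
        for w, m in reduced:
            if v ^ w < v:      # leading bit of w is set in v: eliminate it
                v ^= w
                mask ^= m
        if v:                  # row is independent of the basis found so far
            k = len(B)
            B.append(to_int(row))
            reduced.append((v, mask | (1 << k)))
            coords.append(1 << k)
        else:                  # row equals the XOR of the basis rows in mask
            coords.append(mask)
    r, l = len(B), len(P[0])
    P_prime = [tuple((c >> k) & 1 for k in range(r)) for c in coords]
    B_rows = [tuple((b >> (l - 1 - j)) & 1 for j in range(l)) for b in B]
    return P_prime, B_rows
```

The rows of `P_prime` at the basis positions form an identity matrix; the remaining rows play the role of $D$.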



2.2 Weight enumerator and Tutte polynomial 



The weight enumerator polynomial of $\mathcal{C}$ is a monovariate polynomial (i.e. element of $\mathbb{Z}[\zeta]$
for indeterminate $\zeta$) given by

$$W_{\mathcal{C}}(\zeta) := \sum_{c \in \mathcal{C}} \zeta^{|c|},$$

where $|c|$ denotes the Hamming weight of the codeword $c$.

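The Tutte polynomial of $\mathcal{M}$ is a bivariate polynomial (i.e. element of $\mathbb{Z}[x,y]$ for
indeterminates $x$, $y$) given by

$$T_{\mathcal{M}}(x,y) := \sum_{X \subseteq E} (x-1)^{\rho_{\mathcal{M}}(E)-\rho_{\mathcal{M}}(X)} \cdot (y-1)^{|X|-\rho_{\mathcal{M}}(X)},$$

where $\rho_{\mathcal{M}} : 2^E \to \mathbb{Z}$ is the rank function defining the matroid.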
Here is a random example, with $n = 6$, $r = 3$, $l = 4$:

[Example matrix $P$: entries not recoverable from this copy.]

$$T_{\mathcal{M}(P)}(x,y) = y(x+y)(x^2+x+y).$$



The natural connection between these two concepts is given by Greene's theorem:

Proposition 2 (C. Greene, 1976)

$$W_{\mathcal{C}}(\zeta) = (1-\zeta)^{r} \cdot \zeta^{\,n-r} \cdot T_{\mathcal{M}}\!\left(\frac{1+\zeta}{1-\zeta},\; \frac{1}{\zeta}\right),$$

where $n = |E|$ and $r = \rho_{\mathcal{M}}(E)$ and $E$ is the base set for $\mathcal{M}$, as usual.



This can be proved inductively, using the entirely standard matroid notions of deletion and 
contraction, and taking for the base cases those few examples where $r \le n \le 1$. Note that
loops and coloops are conveniently dealt with first as special cases. ■ 

Throughout this paper, we find it convenient to use a normalised version of the weight
enumerator/Tutte polynomial, given in terms of a single real parameter $\theta$. We fix the
following notation for use throughout the rest of the paper:

$$\alpha(P,\theta) := e^{i\theta n} \cdot 2^{-r} \cdot W_{\mathcal{C}(P)}\big(e^{-2i\theta}\big) = e^{i\theta(r-n)} \cdot i^{r} \cdot \sin^{r}\theta \cdot T_{\mathcal{M}(P)}\!\left(\frac{e^{i\theta}+e^{-i\theta}}{e^{i\theta}-e^{-i\theta}},\; e^{2i\theta}\right),$$

again with $r$ for the rank and $n$ for the length (size) of the code (matroid). Note that $\theta$
is really only defined modulo $2\pi$, and inverting the sign of $\theta$ only conjugates the output
value $\alpha$, trivially.

Here is a useful observation, showing how normalisation ensures $|\alpha(P,\theta)| \le 1$:

Proposition 3 The scalar $\alpha(P,\theta)$ defined above can be expressed directly in terms of the
code $\mathcal{C}(P)$ according to the probabilistic formula below.

$$\alpha(P,\theta) = \mathbb{E}_{c \in \mathcal{C}(P)}\big[\exp\big(i\theta \cdot (n - 2|c|)\big)\big].$$

To see this, observe that the factor $e^{i\theta n}$ can be pulled out in front of the expectation
operator, and what is left constitutes a probabilistic interpretation of the required value
$2^{-r} \cdot W_{\mathcal{C}}(e^{-2i\theta})$, because there are $2^r$ codewords in the code. ■
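As a concrete illustration of Proposition 3, the following Python sketch evaluates $\alpha(P,\theta)$ by listing every codeword. The names `codewords` and `alpha` are hypothetical, and the enumeration takes time exponential in the number of columns, so this is a reference computation for small examples only, not an efficient algorithm.

```python
import cmath
from itertools import product

def codewords(P):
    """All distinct codewords of C(P): the F_2-span of the columns of P."""
    n, l = len(P), len(P[0])
    return {tuple(sum(P[a][b] * s[b] for b in range(l)) % 2 for a in range(n))
            for s in product((0, 1), repeat=l)}

def alpha(P, theta):
    """alpha(P, theta) = E_{c in C(P)} exp(i*theta*(n - 2|c|)), by brute force."""
    C = codewords(P)
    n = len(P)
    return sum(cmath.exp(1j * theta * (n - 2 * sum(c))) for c in C) / len(C)
```

For the one-row matrix `[(1,)]` the code is $\{0, 1\}$ and the function returns $\cos\theta$, as the formula predicts.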



2.3 Computational Complexity 

What is the computational complexity of computing the value of the scalar $\alpha(P,\theta)$? This
question is addressed in [8] and [13]. The multiples of $\pi/4$ are found all to be essentially
trivial.

For example, if $\theta = \pi$ then $\alpha(P,\theta) = (-1)^n$. If $\theta = \pi/2$ then $\alpha(P,\theta)$ vanishes whenever $\mathcal{C}$ is not
an even code, and evaluates to $i^n$ whenever it is even. One can determine whether a code
is even by looking at any generator matrix for it: the code is not even iff any column of
$P$ has odd Hamming weight. If $\theta = \pi/4$ then Vertigan's algorithm [13] provides an
explicit polynomial-time recursion to evaluate $T_{\mathcal{M}}(-i,i)$, which is proportional to $\alpha(P,\pi/4)$.
This algorithm is remarkably similar to the algorithm used in classically simulating the
probability distribution associated to a Gottesman-Knill-Clifford computation [2].
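The $\theta = \pi/2$ rule above amounts to a parity check on the columns of $P$. A minimal Python sketch, with hypothetical function names and rows given as tuples of 0/1:

```python
def is_even_code(P):
    """True iff every codeword of C(P) has even Hamming weight.

    It suffices to check the generators, i.e. the columns of P, because the
    weight parity of a sum of codewords is the sum of their weight parities.
    """
    return all(sum(col) % 2 == 0 for col in zip(*P))

def alpha_at_half_pi(P):
    """Value of alpha(P, pi/2): i^n for an even code, and 0 otherwise."""
    n = len(P)
    return (1j ** n) if is_even_code(P) else 0j
```

The cost is linear in the size of $P$, in contrast to the exponential enumeration that a direct use of Proposition 3 would need.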

Conversely, for any other values of $\theta$, the worst-case complexity for computing $\alpha(P,\theta)$ (over
the class of all binary matrices $P$) is #P-hard, and there are efficient reductions from one
$\theta$ to any other (excluding multiples of $\pi/4$, limiting to algebraic values of $e^{i\theta}$) [8, 13]. In
particular, the reader should note that $\theta = \pi/8$ is no less hard than any other value of $\theta$,
when it comes to evaluating $\alpha(P,\theta)$ in the worst case.

The same hardness holds for specific subclasses of matroid, e.g. for graphic matroids [8].
But it should come as no surprise that there are 'trivial' classes of matroid where $\alpha(P,\theta)$
is readily evaluated for all $\theta$. The primary example would seem to be the case of graphic
matroids having bounded treewidth [3]. Jaeger et al. also claim that $T_{\mathcal{M}}(x,y)$ is easily
evaluated when $(x-1)(y-1) = 2$ (as is always the case for our $\alpha(P,\theta)$) whenever $\mathcal{M}$ is
the cycle matroid of a planar graph (cf. [8] §(5.8), also [14] lemma 12.2, and for the main
reduction, see [6]).

2.4 Transformations 

There are two matroid/code transformations that we shall be requiring in §3 for our first 
two theorems. Notation for these is given here, but is not generally found in the literature. 
We include example computations of both transformations, to add clarity. 

Projection 

Let $x$ be a non-zero string of $l$ bits, and from the matrix $P$ (resp. the code $\mathcal{C}$, the matroid
$\mathcal{M}$) we derive $P{\top}x$ (resp. $\mathcal{C}{\top}x$, $\mathcal{M}{\top}x$) by projecting each row along direction $x$ onto
a hyperplane avoiding $x$. This projection operation is well-defined for both $\mathcal{C}$ and $\mathcal{M}$,
regardless of which hyperplane is chosen.

To make it well-defined for $P$ as well, we specify (somewhat arbitrarily) that this projection
is achieved by mapping a row $a$ to whichever of $\{a, a + x\}$ is lexicographically first. Note
that the rank of $\mathcal{C}{\top}x$ (and of $\mathcal{M}{\top}x$) will either be one less than that of $\mathcal{C}$ (and $\mathcal{M}$), or
else $\mathcal{C}{\top}x = \mathcal{C}$ (and $\mathcal{M}{\top}x = \mathcal{M}$). In either case, $\mathcal{C}{\top}x \subseteq \mathcal{C}$ and the length is not changed.
Note that if $x$ was originally a row of $P$, then $P{\top}x$ contains an all-zero row, so the matroid
$\mathcal{M}{\top}x$ will contain a loop.
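The row-wise rule just stated is easy to make concrete. In this Python sketch the function name `project` and the tuple representation of rows are assumptions made for illustration; the rule itself (keep the lexicographically first of $a$ and $a + x$) is the one specified above.

```python
def project(P, x):
    """Project each row of P along x: row a becomes the lexicographically
    first of {a, a + x}, with addition taken over F_2."""
    out = []
    for a in P:
        a_plus_x = tuple((ai + xi) % 2 for ai, xi in zip(a, x))
        out.append(min(a, a_plus_x))   # tuple comparison is lexicographic
    return out
```

Every output row has a 0 in the position of the first 1 of $x$, so the chosen hyperplane is a coordinate hyperplane that avoids $x$.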

Here is a random example, with $n = 6$, $r = 3$, $l = 4$, and $x = (0110)$:

[Example matrices $P$ and $P{\top}x$: entries not recoverable from this copy.]

In this example, the code/matroid is indeed changed and so the rank drops by one from
3 to 2. One more loop is created (second row) and one more non-trivial parallel class is
created (first and fifth rows now parallel). Echelon-reduced forms of these example matrices
are

[Echelon-reduced matrices: entries not recoverable from this copy.]

$$(P{\top}x)' = (P'{\top}x')',$$

where $x' = (010)$.



Affinification

Let $s$ be a string of $l$ bits, and from the matrix $P$ (resp. the code $\mathcal{C}$, the matroid $\mathcal{M}$) we
derive $P_s$ (resp. $\mathcal{C}_s$, $\mathcal{M}_s$) by deleting all rows that are orthogonal to $s$, using the natural
$\mathbb{F}_2$ inner-product. Note that this increases neither size (length) nor rank. Note that $\mathcal{C}_s$
always contains the all-ones codeword, and (equivalently) $\mathcal{M}_s$ is always an affine matroid,
and indeed a minor of $\mathcal{M}$.
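Affinification is a simple row filter. A minimal Python sketch, where the name `affinify` is hypothetical and rows are tuples of 0/1:

```python
def affinify(P, s):
    """Delete every row of P that is orthogonal to s over F_2, keeping the
    rows a with a . s = 1."""
    return [a for a in P if sum(ai * si for ai, si in zip(a, s)) % 2 == 1]
```

Since every surviving row has inner product 1 with $s$, multiplying the resulting matrix by $s$ gives the all-ones codeword, which is the property noted above.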

Here is a random example, with $n = 6$, $r = 3$, $l = 4$, and $s = (0110)$:

[Example matrices $P$ and $P_s$: entries not recoverable from this copy.]

In this example, the rank did not drop, but the length (size) of the code (matroid) dropped
from 6 to 4. Echelon-reduced forms of these example matrices are

[Echelon-reduced matrices: entries not recoverable from this copy.]

where $s' = (101)$. Note that $x$ and $s$ transform differently, to $x'$ and $s'$ respectively, under
echelon-reduction. This is because $s$ really belongs to the dual space, being the space of
linear maps $\mathbb{F}_2^l \to \mathbb{F}_2$. (It is convenient to think of $x$ as a row-vector and $s$ as a
column-vector.)



3 X-programs for IQP 

The class $\mathrm{QNC}^0_f$ was introduced by Høyer and Špalek [7] to capture the idea of allowing
only a constant amount of scope for temporal complexity within a (uniform) family of
circuits. IQP goes 'a step further' by allowing essentially no temporal structure within
the abstract quantum process. The main simulation results for IQP distributions (as given
by X-programs) and their marginals are given in §3.5 and §3.6, respectively.



3.1 Definition 

The definition given below is a little different from that given in [4], to enable the
combinatorial structure to be seen more clearly.

An X-program $(P, \theta)$ gives a recipe for building a probability distribution. Here $P$ denotes
an $n$-by-$l$ matrix over $\mathbb{F}_2$ and $\theta$ is some real angle. The matrix $P$ is used to build a
Hamiltonian of $n$ commuting terms on $l$ qubits, each term a product of Pauli X operators.
(Formally we demand that $l = O(\mathrm{poly}(n))$, though it turns out there is no point even
taking $l > n$.)

$$H_P := \sum_{a=1}^{n} \prod_{b=1}^{l} X_b^{P_{ab}}.$$

Thus the columns of P correspond to qubits, while the rows of P correspond to 'gates' (in 
a circuit) or 'interactions' (in a Hamiltonian) on the qubits. 

The distribution is then given by

$$\mathbb{P}[X = x] := \big|\langle x|\exp(i\theta H_P)|0\rangle\big|^2,$$

using the squares of the magnitudes of the transition amplitudes, as required by the Born
rule for quantum processes. We call this an IQP probability distribution.
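For small $l$ the definition can be evaluated directly by evolving a state vector. The Python sketch below is an illustration only: the function names are hypothetical, it uses exponential memory, and it relies on the elementary identity $\exp(i\theta A) = \cos\theta \cdot I + i \sin\theta \cdot A$, valid for any operator with $A^2 = I$ (such as a product of Pauli X operators).

```python
import math

def iqp_amplitudes(P, theta):
    """Amplitudes <x| exp(i*theta*H_P) |0> for all x, indexed by the integer
    whose binary digits are x (first qubit = most significant bit)."""
    l = len(P[0])
    state = [0j] * (2 ** l)
    state[0] = 1 + 0j
    c, s = math.cos(theta), math.sin(theta)
    for row in P:
        mask = int("".join(map(str, row)), 2)
        # The term for this row flips exactly the bits set in mask, so
        # exp(i*theta*term) sends |y> to cos(theta)|y> + i sin(theta)|y XOR mask>.
        state = [c * state[y] + 1j * s * state[y ^ mask] for y in range(2 ** l)]
    return state

def iqp_distribution(P, theta):
    """Born-rule probabilities P[X = x] for the X-program (P, theta)."""
    return [abs(a) ** 2 for a in iqp_amplitudes(P, theta)]
```

Because all terms commute, the order in which the rows are applied does not affect the result.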

3.2 Implementation 

We consider that a quantum circuit for sampling from this distribution contains no 'inherent
temporal structure' because all terms in the Hamiltonian commute, and so it makes
no difference (in principle) which is implemented first, or whether they are implemented 
simultaneously, somehow. 

In [12, 11] we showed explicitly how to implement these circuits in certain theoretical 
architectures. For example, one such implementation uses so-called graph states, where P 



is interpreted as (the biadjacency matrix of) a bipartite graph at whose vertices are located 
single qubits. (A particularly interesting mild generalisation of X-programs — still without 
introducing temporal structure — is to consider the preparation and arbitrary measurement 
of graph states, with no intervening feed-forward/adaptive control of measurement results.) 
But these implementation issues will not be of concern to us in the present analysis. For 
this paper, the Hamiltonian $H_P$ 'takes priority' over any circuit that might be designed to
implement it, or the unitary mappings it generates. 

3.3 Correlation coefficients 

As well as being interested in IQP probability distributions and the individual probabilities
$\mathbb{P}[X = x]$, we are also interested in the correlation coefficients. These are given by taking
the $\mathbb{F}_2$ Fourier transform of the probability vector. In symbols, we define

$$\beta_s := 2 \cdot \mathbb{P}[X \cdot s = 0] - 1,$$

and thence derive the following formulas:

$$\begin{aligned}
\mathbb{P}[X \cdot s = 0] &= \frac{1 + \beta_s}{2}, \\
\mathbb{P}[X = x] &= 2^{-l} \cdot \sum_{s \in \mathbb{F}_2^l} (-1)^{x \cdot s} \cdot \beta_s = \mathbb{E}_s\big[(-1)^{x \cdot s} \cdot \beta_s\big].
\end{aligned}$$

The second line above shows that the probability distribution is entirely specified by the
values $\beta_s$, and therefore encodes $2^l - 1$ degrees of freedom (the value $\beta_0$ is fixed as 1 for all
distributions). Because a distribution encodes so much data, it can be a very cumbersome
object to work with.

3.4 Relation to matroids 

Here we give the theorems that show the link between probability distributions for
X-programs and the weight enumerator.

Theorem 1 For any binary matrix $P$ and any angle $\theta$,

$$\langle 0|\exp(i\theta H_P)|0\rangle = \alpha(P,\theta).$$

Moreover, for $x \ne 0$,

$$\langle x|\exp(i\theta H_P)|0\rangle = \alpha(P{\top}x,\theta) - \alpha(P,\theta),$$

where $P{\top}x$ denotes the projection defined in §2.4, and $\alpha$ is the function defined in §2.2.

This theorem has not been published before. We prove it in §4. ■

There is also a simple formula for the correlation coefficients $\beta_s$, as follows.

Theorem 2 Let $s$ be a string of $l$ bits and let $(P, \theta)$ be an X-program on $l$ qubits. Let $\beta_s$
be the correlation coefficient associated to $s$ for the associated IQP probability distribution.
This $\beta_s$ is a real value, and can be specified in terms of the affinification $P_s$, as follows.

$$\beta_s = \alpha(P_s, 2\theta).$$

We prove this theorem in [12, 11], and give a slightly simpler proof in §4, for completeness.
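Theorems 1 and 2 can be checked numerically on small instances. The Python sketch below is a brute-force sanity check, not a proof and not an efficient method: all function names are hypothetical, the amplitude is obtained by direct state-vector evolution, and $\alpha$ is computed by enumerating codewords as in Proposition 3.

```python
import cmath
from itertools import product

def alpha(P, theta):
    """alpha(P, theta) by enumerating C(P); an empty P gives the value 1."""
    if not P:
        return 1.0 + 0j
    n, l = len(P), len(P[0])
    C = {tuple(sum(a[b] * s[b] for b in range(l)) % 2 for a in P)
         for s in product((0, 1), repeat=l)}
    return sum(cmath.exp(1j * theta * (n - 2 * sum(c))) for c in C) / len(C)

def amplitude(P, theta, x):
    """<x| exp(i*theta*H_P) |0> by direct state-vector evolution."""
    l = len(P[0])
    to_int = lambda v: int("".join(map(str, v)), 2)
    state = [0j] * (2 ** l)
    state[0] = 1 + 0j
    for row in P:
        m = to_int(row)
        state = [cmath.cos(theta) * state[y] + 1j * cmath.sin(theta) * state[y ^ m]
                 for y in range(2 ** l)]
    return state[to_int(x)]

def project(P, x):
    """Projection of section 2.4: lexicographically first of {a, a + x}."""
    return [min(a, tuple((ai + xi) % 2 for ai, xi in zip(a, x))) for a in P]

def affinify(P, s):
    """Affinification of section 2.4: keep the rows not orthogonal to s."""
    return [a for a in P if sum(ai * si for ai, si in zip(a, s)) % 2 == 1]

def check_theorems(P, theta, tol=1e-9):
    """Return True if both theorems hold numerically for every x and s."""
    vecs = list(product((0, 1), repeat=len(P[0])))
    probs = {x: abs(amplitude(P, theta, x)) ** 2 for x in vecs}
    for v in vecs:
        # Theorem 1: alpha(P) for x = 0, alpha(projected P) - alpha(P) otherwise.
        expected = alpha(P, theta)
        if any(v):
            expected = alpha(project(P, v), theta) - expected
        if abs(amplitude(P, theta, v) - expected) > tol:
            return False
        # Theorem 2: beta_s = 2 P[X.s = 0] - 1 equals alpha(P_s, 2 theta).
        beta = 2 * sum(p for x, p in probs.items()
                       if sum(xi * si for xi, si in zip(x, v)) % 2 == 0) - 1
        if abs(beta - alpha(affinify(P, v), 2 * theta)) > tol:
            return False
    return True
```

A call such as `check_theorems([(1,0,1),(1,1,0),(0,1,1),(1,1,1)], 0.37)` exercises both identities over all eight strings.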



3.5 Implications for simulation 

In this section, we summarise a variety of observations about IQP probability distributions, 
for different values of $\theta$ and different classes of $P$. (Later in §3.6, we consider 'breaking
apart' IQP distributions into marginal probability distributions.) 

$\theta = \pi/4$ (completely solved)

The case $\theta = \pi/4$ is interesting. In that case, all values $\mathbb{P}[X = x]$ and $\mathbb{P}[X \cdot s = 0]$ can
be computed efficiently using Vertigan's algorithm, together with the theorems above.
Moreover, the actual probability distribution can be efficiently sampled classically, exactly.
This is because the unitary map $\exp(i\frac{\pi}{4} H_P)$ can be decomposed as a product of Clifford
gates, whence the Gottesman-Knill theorem applies. To see this, observe for example
that

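$$\exp\Big(-\tfrac{i\pi}{4}\prod_j X_j\Big) \cdot Z_1 \cdot \exp\Big(\tfrac{i\pi}{4}\prod_j X_j\Big) = Z_1 \cdot \exp\Big(\tfrac{i\pi}{2}\prod_j X_j\Big) = i \cdot Z_1 \cdot \prod_j X_j,$$

(for a product of Pauli X operators that includes $X_1$), and so the Pauli group is fixed by
this unitary. There is thus a link between the classical
simulation of so-called stabiliser states [2], the fact that Clifford circuits are effectively
devoid of temporal structure [5], and Vertigan's algorithm for evaluating $T_{\mathcal{M}}(-i,i)$
efficiently [13].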

The following is an analogue of the Gottesman-Knill Theorem for X-programs, giving a
complete description of the probability distribution in this case:

Theorem 3 For any X-program on $l$ qubits, with $\theta = \pi/4$, there is an efficiently computed
affine space $S$ of $\mathbb{F}_2^l$ over which the associated probability distribution is supported and
uniform. That is to say, for some easily-determined $S$,

$$\mathbb{P}[X = x] = 2^{-\mathrm{rank}(S)} \cdot \{x \in S\}.$$

We prove this theorem in §4. ■

$\theta = \pi/8$ (main area of ongoing research)

Another interesting case, this time not so easy to simulate, occurs when $\theta = \pi/8$. A
quantum circuit could efficiently sample from the distribution, of course, but now a classical
computer could efficiently determine any (polynomial number of) correlation coefficients
of our choosing for that distribution, because $\beta_s = \alpha(P_s, \pi/4)$ is efficiently computable by
Vertigan's algorithm. Therefore, a classical computer would be perfectly able to run an
effective hypothesis test against a purported (sufficiently large) sample generated by the
quantum circuit, regardless of the matrix $P$ chosen. Such a hypothesis test could be
used to distinguish this 'null' hypothesis from some appropriate 'alternative' hypothesis.
This testing paradigm generalises our earlier work [12] and we anticipate it being useful in
validating quantum computing implementations. We hope to explore the ramifications of
this observation more fully in future work.

The question then arises as to whether a classical computer could sample from the
distribution by some other efficient means, or else sample efficiently from some statistically close
distribution. We know that $\mathrm{BPP}_{\mathrm{path}} \ne \mathrm{PP}$ (unless of course the Polynomial Hierarchy
collapses), and so via Toda's theorem and the hardness of computing $\alpha(P, \pi/8)$ [8], we can
deduce that no classical algorithm will be able to sample efficiently from the IQP
distribution exactly, for generic $P$. In fact, we can use the theory of post-selection to derive an
independent proof (independent of [8]) of the hardness of evaluating $W_{\mathcal{C}}(e^{i\pi/4})$, much as
Aaronson [1] used post-selection to derive an independent proof of the closure properties of
PP (see [4] for a fuller account of this).

We leave open the possibility of an approximate classical sampling technique. It is
important to determine just how close a classical algorithm can expect to get, with distance
between distributions being measured as usual by the 1-norm (total variation distance):
this is a subject we hope to address in future research.

Arbitrary $\theta$, but 'special' $P$

For arbitrary $\theta$, Tutte polynomials can be evaluated for those matroids all of whose
connected components are $O(\log n)$ in size, or indeed for graphic matroids with $O(\log n)$
treewidth [3]. This suggests that somehow there is very limited 'data flow' within an
X-program computation.

If $P$ is the incidence matrix of a graph (i.e. two 1s per row), then the formula of
Proposition 3 may be reinterpreted

$$\alpha(P,\theta) = \mathbb{E}_{c}\big[\exp\big(i\theta \cdot (n - 2|c|)\big)\big]$$

so that $n$ counts the number of edges of this graph, and $c$ ranges over all possible cuts, with
$|c|$ denoting the weight of a cut (a cut is simply a partition of vertices into two disjoint
subsets, and its weight is simply the number of edges that cross the partition). Vertigan
[14] indicates that Kasteleyn's Theorem [9] (which gives an efficient technique for counting
perfect matchings on planar graphs) can be used to establish an explicit polytime algorithm
for evaluating $\alpha(P,\theta)$ whenever $P$ is the incidence matrix of a planar graph (cf. §2.3). The
algorithm for achieving this reduction is spelled out in more detail in [6].



Equivalence modulo $\theta$

Before moving on, we give some insight as to why the values $\pi/4$ and $\pi/8$ are particularly
noteworthy...

For two different X-programs $P$, $Q$ (with the same $\theta$ value), we say that $H_P \sim_\theta H_Q$ if they
generate the same unitary map, $\exp(i\theta H_P) = \exp(i\theta H_Q)$. This notion of equivalence can be
useful if one wishes to optimise certain properties of a matrix (perhaps for implementation
reasons) without wishing to alter the distribution it would generate.

Proposition 4 If $\theta = c \cdot \pi/2^d$ for fixed integers $c$, $d$, then for any matrix $P$ (with $n$ rows
and $l$ columns) it is computationally efficient to find a matrix $Q$ such that

• $H_P \sim_\theta H_Q$;

• # rows of $Q$ = $O(\mathrm{poly}(n))$, # columns of $Q$ = $l$;

• each row of $Q$ has at most $d$ ones.

To prove this, simply rewrite the Pauli operator $X_j$ as $1 - 2x_j$, where $x_j = \frac{1 - X_j}{2}$ has
integer eigenvalues. Then rewrite $i\theta H_P$ as a polynomial in the $x_j$s. In this format, the
number of terms may increase by an exponential factor (since a single term, e.g. $X_1 \cdots X_l$,
may expand to form $2^l$ terms), but we can nonetheless efficiently list all those terms whose
degree (number of $x_j$s) is at most $d$. In the expansion, the remaining terms contribute
nothing to $\exp(i\theta H_P)$, because their coefficient is an integer multiple of $2\pi i$, and so we
may ignore them. For example

$$\begin{aligned}
\frac{i\pi}{4} \cdot X_1 X_2 X_3 &= \frac{i\pi}{4} \cdot (1-2x_1)(1-2x_2)(1-2x_3) \\
&= \frac{i\pi}{4}\big(1 - 2x_1 - 2x_2 - 2x_3 + 4x_1x_2 + 4x_1x_3 + 4x_2x_3\big) - 2\pi i\, x_1x_2x_3,
\end{aligned}$$

and so if $\theta = \pi/4$, we can replace $X_1X_2X_3$ by $X_1X_2X_3 + (1 - X_1)(1 - X_2)(1 - X_3)$, which
is of lower degree.

Therefore we take $i\theta H_Q$ to be given by the polynomial $i\theta H_P$ with high-degree terms
(counting $x_j$s) removed. Rewriting this as a polynomial in $X_j$s, we can recover $Q$ with the
desired properties. ■

As a consequence of this, for studying $\theta = \pi/4$ we can make do with limiting to the class of
graphic matroids, but for $\theta = \pi/8$ we should also allow matrices with three 1s per row. It
is doubtful that one can infer that graphic matroids are necessarily trivial in this context,
since hardness results are shown in [4] where the underlying matroids are all graphic (and
$\theta = \pi/8$). Nonetheless, the number of genuinely different programs possible for a given $n$
would seem to be smaller for $\theta = \pi/4$ or even for $\theta = \pi/8$ than for less special angles, because of this
effective restriction on row density.



3.6 Simulating marginal distributions 

What happens if we attempt to simulate only a few of the output bits from an X-program,
assuming that the remaining bits from $X$ are traced out? For this, we need some additional
concepts. Let $m$ be a linear idempotent map on $\mathbb{F}_2^l$, so $m = m^2$. This is called a projector,
and we do not assume it to be orthogonal. For a (primal) vector $x$, if $m(x) = x$ then we
say that $x$ is supported by $m$; equivalently we say that $x$ is in the range of $m$. Let $|m|$
denote the dimension of the range of $m$, so $l - |m|$ denotes the dimension of its kernel. Let
$K$ and $R$ denote the kernel and range of $m$, and let $K^*$ and $R^*$ denote the kernel and range
of its dual ($m^*$), so that $K^*$ is the perpendicular space to $R$, and $R^*$ is the perpendicular
space to $K$. (This is because if $x = m(x)$ then $y \cdot x = y \cdot m(x) = m^*(y) \cdot x$, and so if also
$m^*(y) = 0$, then $y \cdot x = 0$, i.e. $x \perp y$, and so on.)

We say that $m$ is supported on $b$ (qu)bits if $K$ contains $l - b$ distinct vectors of Hamming
weight 1. Note that it is possible for the number of bits $b$ on which $m$ is supported to be
larger than the rank $|m|$. Likewise, $x$ might be supported by $m$ even if $|x| > |m|$. But
if the matrix representation of $m$ were diagonal, then $b = |m|$ and we could think of $m$
simply as masking off certain bit-locations that are to be traced out.

Define a marginal probability distribution via the random variable $m(X)$, according to the
natural construction

$$\mathbb{P}[m(X) = x] := \{x \in R\} \cdot \sum_{k \in K} \mathbb{P}[X = x + k].$$

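Proposition 5 Let $X$ be a random variable for the distribution of an X-program $(P, \theta)$ and
let $m$ be a projector whose range is denoted $R$ and for which the range of $m^*$ is denoted
$R^*$. Restricted to $R$, the random variable $m(X)$ has probability distribution

$$\mathbb{P}[m(X) = x] = \mathbb{E}_{s \in R^*}\big[(-1)^{x \cdot s} \cdot \alpha(P_s, 2\theta)\big].$$

This is seen using the following chain of reasoning, working from the definition above,
recalling §3.3 and Theorem 2.

$$\begin{aligned}
\mathbb{P}[m(X) = x] &= \{x \in R\} \cdot \sum_{k \in K} 2^{-l} \cdot \sum_{s \in \mathbb{F}_2^l} (-1)^{(x+k) \cdot s} \cdot \beta_s \\
&= 2^{-l} \cdot \{x \in R\} \cdot \sum_{s \in \mathbb{F}_2^l} (-1)^{x \cdot s} \cdot \beta_s \cdot \sum_{k \in K} (-1)^{k \cdot s} \\
&= 2^{-|m|} \cdot \{x \in R\} \cdot \sum_{s \in R^*} (-1)^{x \cdot s} \cdot \beta_s \\
&= \{x \in R\} \cdot \mathbb{E}_{s \in R^*}\big[(-1)^{x \cdot s} \cdot \alpha(P_s, 2\theta)\big].
\end{aligned}$$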



Such marginal distributions are often used in defining decision languages from families of 
distributions arising from uniform families of computational circuits [4]. Marginal distributions
are also useful when attempting to render classical sampling algorithms, because
it is clear that if each probability of each marginal distribution is computable, then there 
are many ways in which samples from the distribution can be simulated efficiently. 

Other combinatorial results (strong simulation of marginals) 

A theorem about $\theta = \pi/8$ that works for arbitrary $P$:

Theorem 4 For an X-program $(P,\theta)$ with $n$ terms and angle $\theta = \pi/8$, giving rise to output
distribution $X$, for any 'masking' projector $m$ with $|m| = O(\log n)$, the probability vector
for the marginal distribution $m(X)$ is polynomially long and can be explicitly computed
(classically) efficiently, regardless of $P$.

A theorem for 'sparse' $P$ that works for arbitrary $\theta$:

Theorem 5 For an X-program $(P, \theta)$ with $n$ terms and arbitrary angle $\theta$, giving rise to
output distribution $X$, for any 'masking' projector $m$ with $|m| = O(\log n)$ that is actually
supported on $O(\log n)$ of the qubits, the probability vector for the marginal distribution
$m(X)$ is polynomially long and can be explicitly computed (classically) efficiently, provided
there is some constant $c$ (independent of the problem instance) bounding the number of 1s
in any column of $P$.

A theorem for 'graphic' $P$ that works for arbitrary $\theta$:

Theorem 6 For an X-program $(P, \theta)$ with $n$ terms and arbitrary angle $\theta$, giving rise to
output distribution $X$, for any 'masking' projector $m$ with $|m| = 2$ that is actually supported
on at most 2 of the qubits, the probability vector for the marginal distribution $m(X)$ can
be explicitly computed (classically) efficiently, provided each row of $P$ contains at most two
1s.

Elementary proofs are given in §4. ■ 



Non-combinatorial results (weak simulation of marginals) 

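Moreover, without considering Tutte polynomials at all, when $|m| = 1$, and say $R = \{0, m\}$,
$R^* = \{0, m^*\}$, we can always sample classically from the distribution $m(X)$ over 1-bit
strings (i.e. emulate single output qubits independently) because of the formula

$$\begin{aligned}
\mathbb{P}[m(X) = 0] = \mathbb{P}[X \cdot m^* = 0] &= \frac{1 + \alpha(P_{m^*}, 2\theta)}{2} \\
&= \frac{1 + \mathbb{E}_{c \in \mathcal{C}(P_{m^*})}\big[\exp\big(2i\theta \cdot (n_{m^*} - 2|c|)\big)\big]}{2} \\
&= \mathbb{E}_{c \in \mathcal{C}(P_{m^*})}\big[\cos^2\big(\theta \cdot (n_{m^*} - 2|c|)\big)\big], \\
\mathbb{P}[m(X) = m] &= \mathbb{E}_{c \in \mathcal{C}(P_{m^*})}\big[\sin^2\big(\theta \cdot (n_{m^*} - 2|c|)\big)\big],
\end{aligned}$$

where $n_{m^*}$ denotes the length of $\mathcal{C}(P_{m^*})$ as usual. Note the use of Proposition 3 in this
deduction, and the fact that $\alpha(P_{m^*}, 2\theta)$ is real. We can render this sample classically
efficiently by sampling $\mathcal{C}(P_{m^*})$ uniformly, even though we need have no idea how to compute
the value $\alpha(P_{m^*}, 2\theta)$.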
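The single-bit sampling procedure described above can be sketched as follows. The function name, the `rng` parameter and the tuple representation are assumptions of this illustration; the returned bit is the value of $X \cdot m^*$, so 0 corresponds to the outcome $m(X) = 0$ and 1 to the outcome $m(X) = m$.

```python
import math
import random

def sample_output_bit(P, theta, m_star, rng=random):
    """Sample the bit X . m_star without ever computing alpha.

    Draw a uniform codeword c of C(P_{m*}) and output 0 with probability
    cos^2(theta * (n - 2|c|)), where n is the number of rows of P_{m*}.
    """
    # Affinification: keep only the rows of P not orthogonal to m_star.
    P_s = [a for a in P if sum(ai * si for ai, si in zip(a, m_star)) % 2 == 1]
    n = len(P_s)
    # A uniform message vector yields a uniform codeword, since the encoding
    # map is linear and all of its fibres have the same size.
    msg = [rng.randrange(2) for _ in range(len(m_star))]
    weight = sum(sum(ai * si for ai, si in zip(a, msg)) % 2 for a in P_s)
    p0 = math.cos(theta * (n - 2 * weight)) ** 2
    return 0 if rng.random() < p0 else 1
```

Each call costs time linear in the size of $P$, in keeping with the remark that no evaluation of $\alpha(P_{m^*}, 2\theta)$ is needed.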

This result can be generalised to the following theorem : 

Theorem 7 For an X-program ($n$ terms by $l$ qubits) with any $\theta$, if $m$ is a projector on $\mathbb{F}_2^l$
inducing a marginal probability distribution, with $|m| = O(\log n)$, there is then an efficient
(i.e. $O(\mathrm{poly}(n))$ time) classical algorithm for sampling from this marginal distribution,
accurate to exponential precision.

This is proved in [4] without direct reference to combinatorics. We also provide a proof 
below in §4 without direct reference to quantum computing. ■ 



4 Appendix of Proofs 

The proofs of Theorems 1, 2, 3, 4, 5, 6, 7 are given in this section. 
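Proof of Theorem 1

This Theorem directly links probabilities for X-programs to Tutte polynomials.
First we make a change of basis, writing

$$\langle 0|\exp(i\theta H_P)|0\rangle = \langle +|\exp\Big(i\theta \sum_{a=1}^{n}\prod_{b=1}^{l} Z_b^{P_{ab}}\Big)|+\rangle.$$

Because this contains a matrix that is patently diagonal, and because its vectors $\langle +|$
and $|+\rangle$ represent uniform superpositions (with matching phase), we can assert that this
expression evaluates to

$$\begin{aligned}
2^{-l} \cdot \mathrm{Tr}\Big[\exp\Big(i\theta \sum_{a=1}^{n}\prod_{b=1}^{l} Z_b^{P_{ab}}\Big)\Big]
&= 2^{-l} \cdot \sum_{s \in \mathbb{F}_2^l} \exp\Big(i\theta \sum_{a=1}^{n}\prod_{b=1}^{l} (-1)^{P_{ab} s_b}\Big) \\
&= \mathbb{E}_{s \in \mathbb{F}_2^l}\Big[\exp\Big(i\theta \sum_{a=1}^{n} (-1)^{(P \cdot s)_a}\Big)\Big] \\
&= \mathbb{E}_{c \in \mathcal{C}(P)}\Big[\exp\Big(i\theta \sum_{a=1}^{n} (-1)^{c_a}\Big)\Big] \\
&= 2^{-r} \cdot \sum_{c \in \mathcal{C}(P)} \exp\big(i\theta(n - 2|c|)\big) \\
&= \alpha(P,\theta).
\end{aligned}$$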

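To evaluate the other transition amplitudes, we need to observe that

$$\langle x| = \langle 0| \exp\Big(-\frac{i\pi}{2}\Big(1 - \prod_{b=1}^{l} X_b^{x_b}\Big)\Big).$$

The case $\theta = \frac{\pi}{2t}$ is most easily dealt with, where $t \in \mathbb{Z}^+$. Then let $Q(j)$ denote a matrix
obtained from $P$ by appending $j$ extra rows identical to $x$, and we quickly see that

$$\langle x|\exp\Big(i\theta \sum_{a=1}^{n}\prod_{b=1}^{l} X_b^{P_{ab}}\Big)|0\rangle = e^{-i\theta t} \cdot \langle 0|\exp\Big(i\theta \sum_{a=1}^{n+t}\prod_{b=1}^{l} X_b^{Q(t)_{ab}}\Big)|0\rangle = -i \cdot \alpha(Q(t),\theta).$$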



Now using the fact that $\alpha(Q(j), \theta)$ is proportional to the evaluation of some Tutte polynomial,
we can use the deletion and contraction formulae to remove the extra rows that were added,
according to the following chain of reasoning. Let $R(j)$ denote the matrix $P \top \mathbf{x}$ with $j$ extra
loops (all-zero rows) appended. This corresponds to the matroid obtained by contracting
one of the appended rows from $Q(j + 1)$.

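There are two sub-cases to consider. In the first, the size and rank of $\mathcal{M}(Q(t))$ are $n + t$
and $r$ respectively, and $\mathbf{x}$ lies within the span of $E$, and $C(P \top \mathbf{x})$ is strictly contained within
$C(P)$. We deduce

\[
\begin{aligned}
T_{\mathcal{M}(Q(j))}(x, y) &= T_{\mathcal{M}(Q(j-1))}(x, y) + T_{\mathcal{M}(R(j-1))}(x, y), \\
T_{\mathcal{M}(R(j))}(x, y) &= y^{j} \cdot T_{\mathcal{M}(P \top \mathbf{x})}(x, y),
\end{aligned}
\]

and hence

\[
T_{\mathcal{M}(Q(t))}(x, y) = \frac{1 - y^{t}}{1 - y} \cdot T_{\mathcal{M}(P \top \mathbf{x})}(x, y) + T_{\mathcal{M}(P)}(x, y) .
\]

Substituting this into our formula for $\alpha$, and setting $x = \frac{e^{2i\theta} + 1}{e^{2i\theta} - 1}$ and $y = e^{2i\theta}$, we obtain

\[
\begin{aligned}
-i \cdot \alpha(Q(t), \theta)
  &= -i \cdot e^{i\theta(r - n - t)} \cdot i^{r} \cdot \sin^{r}\theta \cdot
     \Big[ \frac{1 - e^{2i\theta t}}{1 - e^{2i\theta}} \cdot T_{\mathcal{M}(P \top \mathbf{x})}(x, y) + T_{\mathcal{M}(P)}(x, y) \Big] \\
  &= -e^{i\theta(r - n)} \cdot i^{r} \cdot \sin^{r}\theta \cdot
     \Big[ \frac{2}{1 - e^{2i\theta}} \cdot T_{\mathcal{M}(P \top \mathbf{x})}(x, y) + T_{\mathcal{M}(P)}(x, y) \Big] \\
  &= \alpha(P \top \mathbf{x}, \theta) - \alpha(P, \theta) .
\end{aligned}
\]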



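The second sub-case has that the size and rank of $\mathcal{M}(Q(t))$ are $n + t$ and $r + 1$ respectively,
and $\mathbf{x}$ does not lie within the span of $E$, and $C(P \top \mathbf{x}) = C(P)$. We deduce

\[
\begin{aligned}
T_{\mathcal{M}(Q(1))}(x, y) &= x \cdot T_{\mathcal{M}(P)}(x, y), \\
T_{\mathcal{M}(Q(j))}(x, y) &= T_{\mathcal{M}(Q(j-1))}(x, y) + T_{\mathcal{M}(R(j-1))}(x, y) \qquad (j \ge 2), \\
T_{\mathcal{M}(R(j))}(x, y) &= y^{j} \cdot T_{\mathcal{M}(P \top \mathbf{x})}(x, y), \\
T_{\mathcal{M}(P \top \mathbf{x})}(x, y) &= T_{\mathcal{M}(P)}(x, y),
\end{aligned}
\]

and hence

\[
T_{\mathcal{M}(Q(t))}(x, y) = \Big( \frac{1 - y^{t}}{1 - y} + x - 1 \Big) \cdot T_{\mathcal{M}(P)}(x, y) .
\]

On substituting in $x = \frac{e^{2i\theta} + 1}{e^{2i\theta} - 1}$ and $y = e^{2i\theta}$, we see that the expression vanishes. It
therefore still holds (now somewhat vacuously) that

\[
-i \cdot \alpha(Q(t), \theta) = \alpha(P \top \mathbf{x}, \theta) - \alpha(P, \theta) .
\]

With both sub-cases established, we argue that since this equation holds whenever $\theta = \frac{\pi}{2t}$
for $t \in \mathbb{Z}^+$, so it must hold for all $\theta$. This is true because the expressions can be thought
of as Laurent polynomials in $z = e^{2i\theta} - 1$ (i.e. elements of $\mathbb{C}[z, z^{-1}]$, because $y = z + 1$
and $x = 1 + 2z^{-1}$), and if two Laurent polynomials agree at an infinite number of distinct
places, then they must agree everywhere. ■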
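The identity reached in both sub-cases can be tested by brute force for arbitrary $\theta$. The sketch below rests on two assumptions of ours: the amplitude is evaluated in the diagonal basis as a signed average over $\mathbf{s} \in \mathbb{F}_2^l$, and $C(P \top \mathbf{x})$ is taken to be the set of codewords $P \cdot \mathbf{s}$ with $\mathbf{x} \cdot \mathbf{s} = 0$ (which is what contracting the appended row yields). All function names are illustrative, and $\mathbf{x}$ must be non-zero.

```python
import cmath
from itertools import product

def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def codeword(P, s):
    """The column vector P . s, as a tuple of bits."""
    return tuple(dot(row, s) for row in P)

def alpha(code, n, theta):
    """Average of exp(i*theta*(n - 2|c|)) over a set of length-n codewords."""
    return sum(cmath.exp(1j * theta * (n - 2 * sum(c))) for c in code) / len(code)

def amplitude(P, x, theta):
    """<x| exp(i*theta*H_P) |0>, written as a signed average over s in F_2^l."""
    n, l = len(P), len(P[0])
    total = sum((-1) ** dot(x, s) * cmath.exp(1j * theta * (n - 2 * sum(codeword(P, s))))
                for s in product((0, 1), repeat=l))
    return total / 2 ** l

def alpha_difference(P, x, theta):
    """alpha(P T x, theta) - alpha(P, theta), under the assumption stated above."""
    n, l = len(P), len(P[0])
    vecs = list(product((0, 1), repeat=l))
    full = {codeword(P, s) for s in vecs}
    sub = {codeword(P, s) for s in vecs if dot(x, s) == 0}   # assumed C(P T x)
    return alpha(sub, n, theta) - alpha(full, n, theta)

# Hypothetical rank-2 example: some x lie in the row span, others do not.
P_EXAMPLE = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1]]
```

Because `P_EXAMPLE` has rank 2 on 3 qubits, looping over all non-zero `x` exercises both sub-cases of the proof.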

Proof of Theorem 2 



The proof of this Theorem first appears in [12], though the present version is a little 
simpler. 

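We start with the basic definition of $\mathbb{P}[X = \mathbf{x}]$, and change into the diagonal basis in order
to make the summation and lose the bra-ket notation. Then we work on removing the
modulus signs. This gives

\[
\begin{aligned}
\mathbb{P}[X = \mathbf{x}]
  &= \Big| \langle \mathbf{x} | \exp\Big( i\theta \sum_{a=1}^{n} \prod_{b=1}^{l} X_b^{P_{ab}} \Big) | \mathbf{0} \rangle \Big|^2 \\
  &= \Big| 2^{-l} \sum_{\mathbf{a} \in \mathbb{F}_2^l} \sum_{\mathbf{b} \in \mathbb{F}_2^l} (-1)^{\mathbf{x} \cdot \mathbf{a}}
       \langle \mathbf{a} | \exp\Big( i\theta \sum_{a=1}^{n} \prod_{b=1}^{l} Z_b^{P_{ab}} \Big) | \mathbf{b} \rangle \Big|^2 \\
  &= \Big| 2^{-l} \sum_{\mathbf{a} \in \mathbb{F}_2^l} (-1)^{\mathbf{x} \cdot \mathbf{a}}
       \exp\Big( i\theta \sum_{a=1}^{n} (-1)^{P_a \cdot \mathbf{a}} \Big) \Big|^2 \\
  &= \Big| \mathbb{E}_{\mathbf{a}} \Big[ (-1)^{\mathbf{x} \cdot \mathbf{a}}
       \exp\Big( i\theta \sum_{a=1}^{n} (-1)^{P_a \cdot \mathbf{a}} \Big) \Big] \Big|^2 \\
  &= \mathbb{E}_{\mathbf{a}, \mathbf{d}} \Big[ (-1)^{\mathbf{x} \cdot \mathbf{d}}
       \exp\Big( i\theta \sum_{a=1}^{n} (-1)^{P_a \cdot \mathbf{a}} \big( 1 - (-1)^{P_a \cdot \mathbf{d}} \big) \Big) \Big] .
\end{aligned}
\]
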
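Next, we take the definition for $\beta_{\mathbf{s}}$ in terms of the Fourier transform, drop in the formula
above, and then just kill off the spare variables.

\[
\begin{aligned}
\beta_{\mathbf{s}}
  &= \sum_{\mathbf{x} \in \mathbb{F}_2^l} (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot \mathbb{P}[X = \mathbf{x}] \\
  &= \sum_{\mathbf{x} \in \mathbb{F}_2^l} (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot \mathbb{E}_{\mathbf{a}, \mathbf{d}} \Big[ (-1)^{\mathbf{x} \cdot \mathbf{d}}
       \exp\Big( i\theta \sum_{a=1}^{n} (-1)^{P_a \cdot \mathbf{a}} \big( 1 - (-1)^{P_a \cdot \mathbf{d}} \big) \Big) \Big] \\
  &= \mathbb{E}_{\mathbf{a}, \mathbf{d}} \Big[ 2^{l} \cdot \{ \mathbf{s} = \mathbf{d} \} \cdot
       \exp\Big( i\theta \sum_{a=1}^{n} (-1)^{P_a \cdot \mathbf{a}} \big( 1 - (-1)^{P_a \cdot \mathbf{d}} \big) \Big) \Big] \\
  &= \mathbb{E}_{\mathbf{a}} \Big[ \exp\Big( i\theta \sum_{a=1}^{n} (-1)^{P_a \cdot \mathbf{a}} \big( 1 - (-1)^{P_a \cdot \mathbf{s}} \big) \Big) \Big] .
\end{aligned}
\]
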

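Now it is clear that whenever $P_a \cdot \mathbf{s} = 0$, the corresponding term vanishes, independent
of the variable $\mathbf{a}$. Thus we need only include in the sum those terms for which
$P_a \cdot \mathbf{s} = 1$.

\[
\beta_{\mathbf{s}} = \mathbb{E}_{\mathbf{a} \in \mathbb{F}_2^l} \Big[ \exp\Big( 2i\theta \sum_{a \,:\, P_a \cdot \mathbf{s} = 1} (-1)^{P_a \cdot \mathbf{a}} \Big) \Big] .
\]

This formula we can naturally express in terms of the code $C(P_{\mathbf{s}})$. Let $n_{\mathbf{s}}$ be the size of
the affinified matroid (equivalently, the length of the resulting code). Use Proposition 3 to
finish the deduction.

\[
\begin{aligned}
\beta_{\mathbf{s}} &= \mathbb{E}_{\mathbf{c} \in C(P_{\mathbf{s}})} \big[ \exp\big( 2i\theta ( n_{\mathbf{s}} - 2|\mathbf{c}| ) \big) \big] \\
  &= \alpha(P_{\mathbf{s}}, 2\theta) . \qquad ■
\end{aligned}
\]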
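The statement $\beta_{\mathbf{s}} = \alpha(P_{\mathbf{s}}, 2\theta)$ lends itself to a direct numerical check on small instances. In the sketch below (names and example matrix are hypothetical), `beta_fourier` computes the correlation coefficient from brute-force probabilities, while `beta_code` keeps only the rows $P_a$ with $P_a \cdot \mathbf{s} = 1$ and averages over the resulting code.

```python
import cmath
from itertools import product

def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def probabilities(P, theta):
    """Brute-force P[X = x] = |E_a[(-1)^(x.a) exp(i*theta*sum_r (-1)^(P_r . a))]|^2."""
    l = len(P[0])
    vecs = list(product((0, 1), repeat=l))
    phase = {a: cmath.exp(1j * theta * sum((-1) ** dot(row, a) for row in P)) for a in vecs}
    return {x: abs(sum((-1) ** dot(x, a) * phase[a] for a in vecs) / len(vecs)) ** 2
            for x in vecs}

def beta_fourier(P, s, theta):
    """beta_s as the Fourier coefficient sum_x (-1)^(x.s) P[X = x]."""
    return sum((-1) ** dot(x, s) * p for x, p in probabilities(P, theta).items())

def beta_code(P, s, theta):
    """alpha(P_s, 2*theta): average of exp(2i*theta*(n_s - 2|c|)) over C(P_s)."""
    l = len(P[0])
    P_s = [row for row in P if dot(row, s) == 1]      # the affinified rows
    n_s = len(P_s)
    code = {tuple(dot(row, a) for row in P_s) for a in product((0, 1), repeat=l)}
    return sum(cmath.exp(2j * theta * (n_s - 2 * sum(c))) for c in code) / len(code)

# Hypothetical example matrix (4 terms, 3 qubits).
P_EXAMPLE = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1]]
```

For $\mathbf{s} = \mathbf{0}$ the affinified matrix has no rows, and both functions return 1, as expected of a normalised distribution.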



Proof of Theorem 3 

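This Theorem and its proof are essentially a reworking of Vertigan's algorithm for $T_{\mathcal{M}}(-i, i)$
on binary matroids (because of Theorem 1), but here expressed more directly in terms of
X-programs and their distributions. (Note that Vertigan's algorithm [13] effectively computes
$\alpha(P, \frac{\pi}{4})$ for input $P$, whereas the algorithm presented below discovers probabilities, and
hence only $|\alpha(P, \frac{\pi}{4})|$.)

Let $\mathbf{x}$ be any element of $\mathbb{F}_2^l$ and consider

\[
\begin{aligned}
\mathbb{P}[X = \mathbf{x}]
  &= \mathbb{E}_{\mathbf{s} \in \mathbb{F}_2^l} \big[ (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot \alpha(P_{\mathbf{s}}, \tfrac{\pi}{2}) \big] \\
  &= \mathbb{E}_{\mathbf{s} \in \mathbb{F}_2^l} \big[ (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot 2^{-r_{\mathbf{s}}} \cdot i^{n_{\mathbf{s}}} \cdot W_{C(P_{\mathbf{s}})}(-1) \big] ,
\end{aligned}
\]

where $r_{\mathbf{s}}$ denotes the rank of $P_{\mathbf{s}}$.
The expression $2^{-r_{\mathbf{s}}} \cdot W_{C(P_{\mathbf{s}})}(-1)$ has some particularly simple interpretations. Evidently,
if $C(P_{\mathbf{s}})$ is an even code, then this expression takes the value 1; otherwise it vanishes. But
the following conditions are all equivalent :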

• $C(P_{\mathbf{s}})$ is an even code;

• the column vector $P \cdot \mathbf{s}$ is orthogonal to every codeword of $C(P)$;

• $(P \cdot \mathbf{s})^T \cdot P$ is all-zero;

• $\mathbf{s} \in \mathrm{Ker}(P^T \cdot P)$.

Accordingly, we can restrict attention to those $\mathbf{s} \in \mathrm{Ker}(P^T \cdot P)$. This is simply a linear
constraint on $\mathbf{s}$, because the kernel of a linear map is a subspace of its domain. Write $V$
as short-hand for $\mathrm{Ker}(P^T \cdot P)$. Observing that $n_{\mathbf{s}}$ is given by $|P \cdot \mathbf{s}|$ (the Hamming weight
of the column vector $P \cdot \mathbf{s}$), we see that it is indeed always even in the case $\mathbf{s} \in V$, as one
might expect.

Then define the subspace $U := \{ \mathbf{s} : \mathbf{s} \in V,\ i^{n_{\mathbf{s}}} = 1 \} \le V$, consisting of those elements
$\mathbf{s}$ of $V$ for which 4 divides $n_{\mathbf{s}}$. (Note that $U$ really is a subspace, because it is closed under
addition of elements, because for all $\mathbf{s} \in V$, $P \cdot \mathbf{s}$ is orthogonal to everything of the form
$P \cdot \mathbf{a}$.)

Combining these ideas, 

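\[
\begin{aligned}
\mathbb{P}[X = \mathbf{x}]
  &= 2^{-l} \cdot \sum_{\mathbf{s} \in \mathbb{F}_2^l} (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot 2^{-r_{\mathbf{s}}} \cdot i^{n_{\mathbf{s}}} \cdot W_{C(P_{\mathbf{s}})}(-1) \\
  &= 2^{-l} \cdot \sum_{\mathbf{s} \in V} (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot i^{n_{\mathbf{s}}} \\
  &= 2^{-l} \cdot \Big( 2 \sum_{\mathbf{s} \in U} (-1)^{\mathbf{x} \cdot \mathbf{s}} - \sum_{\mathbf{s} \in V} (-1)^{\mathbf{x} \cdot \mathbf{s}} \Big) .
\end{aligned}
\]
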

Now this can be simplified further, because each of these latter sums either vanishes or
else is equal to the number of terms it contains. Finally, we consider this expression in two
different cases.

First case: if $U = V$ (because 4 divides $n_{\mathbf{s}}$ for every $\mathbf{s} \in V$) then the expression simplifies
to

\[
\mathbb{P}[X = \mathbf{x}] = 2^{-l} \cdot \Big( \sum_{\mathbf{s} \in V} (-1)^{\mathbf{x} \cdot \mathbf{s}} \Big)
  = 2^{\dim(V) - l} \cdot \{ \mathbf{x} \perp V \},
\]

which means that the distribution is uniform over the subspace of vectors orthogonal to
the kernel of $P^T \cdot P$. We could naturally denote this 'support' set $S_1 := V^{\perp}$. Because
$\mathbf{0} \in V^{\perp}$, this case leads to $|\alpha(P, \frac{\pi}{4})| = 2^{(\dim(V) - l)/2}$.

Second case: if $U < V$ (because only half of the elements of $V$ have that 4 divides $n_{\mathbf{s}}$)
then the expression simplifies to

\[
\mathbb{P}[X = \mathbf{x}] = 2^{\dim(V) - l} \cdot \big( \{ \mathbf{x} \perp U \} - \{ \mathbf{x} \perp V \} \big),
\]

which means that the distribution is uniform over the affine space comprising the (unique)
non-trivial coset of $V^{\perp}$ in $U^{\perp}$, equivalently denoted $S_2 := U^{\perp} \setminus V^{\perp}$. Because
$\mathbf{0} \notin U^{\perp} \setminus V^{\perp}$, this case leads to $\alpha(P, \frac{\pi}{4}) = 0$. Considering our example $P$ from §2.2 :
$l = 4$, $V = \langle (1001), (0111) \rangle$, $U = \langle (0111) \rangle$, and indeed $W_{C(P)}(-i) = 0$.

The two cases ($S_1$, $S_2$) are distinguished by computing (bases for) $V$ and $U$. It is clear that
$V$ is readily computed. To find $U$ (and thus establish the proof) one need only examine a
basis for $V$. For a fixed basis, if every basis element $\mathbf{s}$ of $V$ has that 4 divides $n_{\mathbf{s}}$, then by
closure it must be that $U = V$. Alternatively, if some 'bad' subset of the basis has elements
$\mathbf{s}$ for which 4 does not divide $n_{\mathbf{s}}$, then by a simple elimination technique we can change the
basis (adding one 'bad' element into another to make it 'good') so that exactly one basis
element has 4 not dividing $n_{\mathbf{s}}$. With this one 'bad' element removed, the remaining set
forms a basis for $U$. ■
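The procedure just described is short enough to sketch in code. What follows is a minimal illustration under our own naming (`kernel_basis`, `bases_V_U`, `predicted` are hypothetical helpers, not taken from [13]): it finds a basis for $V = \mathrm{Ker}(P^T \cdot P)$ by Gauss-Jordan elimination over $\mathbb{F}_2$, repairs 'bad' basis elements to obtain $U$, and compares the predicted distribution at $\theta = \frac{\pi}{4}$ with a brute-force evaluation that is only feasible for small $l$.

```python
import cmath
from itertools import product

def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def kernel_basis(M):
    """Basis of {s : M s = 0} over F_2, by Gauss-Jordan elimination on the rows of M."""
    rows = [list(r) for r in M]
    l = len(rows[0])
    pivots = []
    for col in range(l):
        r = len(pivots)
        piv = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
    basis = []
    for f in (c for c in range(l) if c not in pivots):   # one vector per free column
        v = [0] * l
        v[f] = 1
        for i, pc in enumerate(pivots):
            v[pc] = rows[i][f]
        basis.append(tuple(v))
    return basis

def weight(P, s):
    """n_s = |P . s|, the Hamming weight of the column vector P . s."""
    return sum(dot(row, s) for row in P)

def bases_V_U(P):
    l = len(P[0])
    gram = [[sum(row[i] * row[j] for row in P) % 2 for j in range(l)] for i in range(l)]
    V = kernel_basis(gram)
    good = [s for s in V if weight(P, s) % 4 == 0]
    bad = [s for s in V if weight(P, s) % 4 != 0]
    # Adding one 'bad' element into another makes it 'good'; the leftover one is dropped.
    U = good + [tuple(a ^ b for a, b in zip(s, bad[0])) for s in bad[1:]]
    return V, U

def predicted(P, x):
    """Probability of x at theta = pi/4, from the two cases of the proof."""
    V, U = bases_V_U(P)
    perp_V = all(dot(x, s) == 0 for s in V)
    perp_U = all(dot(x, s) == 0 for s in U)
    scale = 2.0 ** (len(V) - len(P[0]))
    return scale * perp_V if len(U) == len(V) else scale * (perp_U - perp_V)

def brute_force(P, x, theta=cmath.pi / 4):
    l = len(P[0])
    amp = sum((-1) ** dot(x, s) * cmath.exp(1j * theta * sum((-1) ** dot(r, s) for r in P))
              for s in product((0, 1), repeat=l)) / 2 ** l
    return abs(amp) ** 2
```

For instance, `bases_V_U([[1, 0], [1, 0], [0, 1], [0, 1]])` starts from two 'bad' basis vectors of $V$ and returns a one-dimensional $U$ spanned by their sum, so the predicted distribution is concentrated on the single string `(1, 1)`.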



Proof of Theorem 4 

Certainly when $|m| = O(\log n)$, the number of degrees of freedom associated to the
distribution for $m(X)$ scales only polynomially in $n$.

If also $\theta = \frac{\pi}{8}$ then the entirety of this distribution becomes classically computable and
simulable. This is achieved by running Vertigan's algorithm a polynomial number of times
to compute the $\beta_{\mathbf{s}}$ values explicitly for all $\mathbf{s} \in R^*$, in accordance with Proposition 5 : then
we compute the Fourier transform of that vector to find the probability vector for the
distribution in question. ■
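The final Fourier-transform step can be illustrated as follows. This sketch makes two simplifying assumptions that are ours alone: the projector $m$ simply selects a subset `keep` of the qubits, and each $\beta_{\mathbf{s}}$ is obtained by enumerating $C(P_{\mathbf{s}})$ directly, standing in for Vertigan's algorithm (which is not implemented here).

```python
import cmath
from itertools import product
from math import pi

def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def beta(P, s, theta):
    """beta_s = alpha(P_s, 2*theta), by enumerating the code of the affinified matrix."""
    l = len(P[0])
    P_s = [row for row in P if dot(row, s) == 1]
    n_s = len(P_s)
    code = {tuple(dot(row, a) for row in P_s) for a in product((0, 1), repeat=l)}
    return sum(cmath.exp(2j * theta * (n_s - 2 * sum(c))) for c in code).real / len(code)

def marginal_from_betas(P, keep, theta=pi / 8):
    """Marginal on the qubits in `keep`: P[m(X) = x] = E_{s in R*}[(-1)^(x.s) beta_s]."""
    l = len(P[0])
    small = list(product((0, 1), repeat=len(keep)))
    betas = {}
    for t in small:
        s = [0] * l
        for bit, i in zip(t, keep):      # embed t into F_2^l on the kept coordinates
            s[i] = bit
        betas[t] = beta(P, s, theta)
    return {x: sum((-1) ** dot(x, t) * betas[t] for t in small) / len(small)
            for x in small}

# Hypothetical example matrix (4 terms, 3 qubits).
P_EXAMPLE = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1]]
```

Only $2^{|m|}$ coefficients are ever computed, which is the source of the polynomial bound when $|m| = O(\log n)$.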

Proof of Theorems 5 and 6 

As above, $|m| = O(\log n)$ implies that we shall need to compute only polynomially many
correlation coefficients ($\beta_{\mathbf{s}}$). Moreover, the condition on $m$ actually being supported on
$O(\log n)$ of the qubits (for Theorem 5) means that we need compute $\beta_{\mathbf{s}}$ only for cases
where $|\mathbf{s}| = O(\log n)$. Recall (Theorem 2 & Proposition 2) that $\beta_{\mathbf{s}}$ depends on the Tutte
polynomial, rank, and size of $\mathcal{M}(P_{\mathbf{s}})$.

For any $\mathbf{s} \in \mathbb{F}_2^l$ with $|\mathbf{s}| = O(\log n)$, the size (and hence rank) of $\mathcal{M}(P_{\mathbf{s}})$ evidently cannot be
bigger than $c \cdot O(\log n)$, where $c$ bounds the number of 1s in every column of $P$ (this can be
seen e.g. by bounding the number of 1s there can be within the columns indicated by $\mathbf{s}$, and
each row of $P_{\mathbf{s}}$ must contain at least one of these 1s). Even a 'dumb' (exponentially slow)
algorithm for evaluating Tutte polynomials will require only time scaling polynomially in
$n^{c}$ for such tiny matroids. Hence all the required correlation coefficients, and thence the
entire marginal probability distribution, can be computed as required for Theorem 5.
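To make the counting argument concrete, here is a sketch of one such 'dumb' method, phrased in terms of codes rather than Tutte polynomials (an equivalent route via Theorem 2, and our own choice of presentation). It enumerates the $2^{\mathrm{rank}}$ codewords of $C(P_{\mathbf{s}})$, which number at most $2^{n_{\mathbf{s}}}$; the helper names and the use of integer bitmasks for columns are illustrative assumptions.

```python
from itertools import product
from math import cos

def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def xor_basis(vectors):
    """Independent spanning subset of bitmask vectors over F_2."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # clears the leading bit of b from v, if present
        if v:
            basis.append(v)
    return basis

def beta_small(P, s, theta):
    """beta_s = E_{c in C(P_s)} cos(2*theta*(n_s - 2|c|)), via 2^rank codewords."""
    P_s = [row for row in P if dot(row, s) == 1]
    n_s = len(P_s)
    # Column j of P_s as an integer whose bit i is the entry in row i.
    columns = [sum(row[j] << i for i, row in enumerate(P_s)) for j in range(len(P[0]))]
    code = [0]
    for b in xor_basis(columns):
        code += [c ^ b for c in code]   # doubles the list: span of the basis so far
    return sum(cos(2 * theta * (n_s - 2 * bin(c).count("1"))) for c in code) / len(code)

# Hypothetical sparse example (5 terms, 4 qubits).
P_SPARSE = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 0]]
```

The running time is governed by $2^{n_{\mathbf{s}}}$ rather than $2^{l}$, so it stays polynomial in $n$ whenever $n_{\mathbf{s}} = O(\log n)$.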

Note that neither the bounds on the Hamming weight of rows/columns of P nor the 
conditions regarding m being supported on some number of qubits are matroid invariants : 
rather, they depend critically on the actual presentation of P and m and are not preserved 
under basis transformation. 

For Theorem 6, we have at most three correlation coefficients to compute (since the range
of $m$ has dimension at most two, and hence cardinality at most 4, but $\beta_{\mathbf{0}}$ is always 1).
In each case, $\mathcal{M}(P_{\mathbf{s}})$ is the cycle matroid of some bipartite graph one of whose partitions
contains at most two vertices (corresponding to the qubits selected by $m$). It is relatively
straightforward to come up with a polytime algorithm for evaluating the Tutte polynomial
of such an easy graph.
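For experimenting with small graphs of this kind, a naive deletion-contraction evaluator is easy to write. The sketch below is exponentially slow and is not the polytime algorithm alluded to; it merely evaluates the Tutte polynomial of a multigraph at a numeric point. The edge-list representation and function names are our own choices. As a sanity check it is compared against the standard closed form for bundles of parallel edges meeting at a common hub vertex.

```python
def connected(u, v, edges):
    """True if v is reachable from u using the given undirected edges."""
    seen, stack = {u}, [u]
    while stack:
        w = stack.pop()
        for a, b in edges:
            for p, q in ((a, b), (b, a)):
                if p == w and q not in seen:
                    seen.add(q)
                    stack.append(q)
    return v in seen

def tutte(edges, x, y):
    """Tutte polynomial of a multigraph (list of vertex pairs), evaluated at (x, y)."""
    if not edges:
        return 1
    (u, v), rest = edges[0], edges[1:]
    if u == v:                                   # loop: factor of y
        return y * tutte(rest, x, y)
    # Contract the edge by renaming v to u in the remaining edges.
    contracted = [(u if a == v else a, u if b == v else b) for a, b in rest]
    if not connected(u, v, rest):                # bridge (coloop): factor of x
        return x * tutte(contracted, x, y)
    return tutte(rest, x, y) + tutte(contracted, x, y)

def hub_closed_form(bundle_sizes, x, y):
    """Product over bundles of ((y^a - 1)/(y - 1) + x - 1), valid for y != 1."""
    result = 1
    for a in bundle_sizes:
        result *= (y ** a - 1) / (y - 1) + x - 1
    return result

# Hub vertex 0 joined to vertex 1 by 2 parallel edges and to vertex 2 by 3.
HUB_EDGES = [(0, 1), (0, 1), (0, 2), (0, 2), (0, 2)]
```

Here `tutte(HUB_EDGES, 2, 3)` expands to $(x + y)(x + y + y^2)$ at $x = 2$, $y = 3$, which agrees with `hub_closed_form((2, 3), 2, 3)`.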

We will not bore the reader with an explicit derivation of such an algorithm. Suffice it to say 
that it is relatively straightforward to identify and group the graph minors intelligently, 
so as to make recursive evaluation of the Tutte polynomial efficient. This is because 
every minor of such a graph — having once had all its coloops deleted — is never more than 
two edge-contractions away from a 'star-shaped' graph, whose Tutte polynomial takes the 
explicit form 

\[
T_{\mathrm{star}(\mathbf{a})}(x, y) = \prod_{j=1}^{k} \Big( \frac{y^{a_j} - 1}{y - 1} + x - 1 \Big)
\]

for some integer tuple $\mathbf{a} = (a_1, \ldots, a_k)$ describing the star-graph. It is likely that this sort of
construction could be generalised (presumably in terms of treewidth), extending the Theorem to
cope with $|m| > 2$ (at the cost of much ink), but no such analysis has yet been made.

Proof of Theorem 7

We already saw (cf. Proposition 5, Theorem 2, and Proposition 3) that for any projector
$m$ on $\mathbb{F}_2^l$ (range denoted $R$), for $\mathbf{x}$ restricted to $R$,

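\[
\begin{aligned}
\mathbb{P}[m(X) = \mathbf{x}]
  &= \mathbb{E}_{\mathbf{s} \in R^*} \big[ (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot \beta_{\mathbf{s}} \big] \\
  &= \mathbb{E}_{\mathbf{s} \in R^*} \Big[ (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot \mathbb{E}_{\mathbf{c} \in C(P_{\mathbf{s}})} \big[ \cos\big( 2\theta \cdot ( n_{\mathbf{s}} - 2|\mathbf{c}| ) \big) \big] \Big] ,
\end{aligned}
\]

where $n_{\mathbf{s}}$ denotes the length of $C(P_{\mathbf{s}})$ as usual. Conceptually breaking the code $C(P_{\mathbf{s}})$ into
the direct sum of two pieces across the boundary created by the mask $m$, we write the
codeword $\mathbf{c} \in C(P_{\mathbf{s}})$ as the sum of $P_{\mathbf{s}} \cdot \mathbf{t}$ and $P_{\mathbf{s}} \cdot \mathbf{k}$, for $\mathbf{t}$ and $\mathbf{k}$ in $R^*$ and $K^*$, respectively.
Hence we find that another way to write the same distribution is

\[
\mathbb{P}[m(X) = \mathbf{x}] = \mathbb{E}_{\mathbf{k} \in K^*} \mathbb{E}_{\mathbf{s}, \mathbf{t} \in R^*}
  \Big[ (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot \cos\big( 2\theta \cdot ( n_{\mathbf{s}} - 2|P_{\mathbf{s}} \cdot (\mathbf{t} + \mathbf{k})| ) \big) \Big] .
\]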



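Therefore, for any $\mathbf{k} \in K^*$, let us define

\[
\mathbb{P}[m(X) = \mathbf{x} \mid \mathbf{k}] := \mathbb{E}_{\mathbf{s}, \mathbf{t} \in R^*}
  \Big[ (-1)^{\mathbf{x} \cdot \mathbf{s}} \cdot \cos\big( 2\theta \cdot ( n_{\mathbf{s}} - 2|P_{\mathbf{s}} \cdot (\mathbf{t} + \mathbf{k})| ) \big) \Big] .
\]

Observe that this expression may be rewritten as follows.

\[
\mathbb{P}[m(X) = \mathbf{x} \mid \mathbf{k}] = \Big| \mathbb{E}_{\mathbf{t} \in R^*}
  \Big[ (-1)^{\mathbf{x} \cdot \mathbf{t}} \cdot \exp\Big( i\theta \cdot \sum_{a=1}^{n} (-1)^{P_a \cdot (\mathbf{t} + \mathbf{k})} \Big) \Big] \Big|^2 .
\]

(The computation proceeds along similar lines to the one in the proof of Theorem 2, but
in reverse.) And from here we see that this genuinely describes a probability distribution,
i.e. is non-negative for all $\mathbf{x}$.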

It then follows immediately that the distribution for $m(X)$ is just the uniform combination
of the distributions above, over $\mathbf{k} \in K^*$. Moreover, when $|m| = O(\log n)$, the dimension
of the range $R^*$ is sufficiently small that there are only polynomially many $\mathbf{t}$ to worry
about. To sample from $m(X)$ one need therefore only choose $\mathbf{k}$ uniformly from $K^*$ and
then explicitly compute the entire probability vector shown above for that value of $\mathbf{k}$, to
an appropriate level of accuracy. Provided we use a fresh vector $\mathbf{k}$ for each sample, the
simulation will be exact, up to accuracy of the numerical precision of the real computation.
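The sampling procedure can be sketched as follows, under the simplifying assumption (ours) that $m$ selects a subset `keep` of the qubits, so that $R^*$ consists of vectors supported on `keep` and $K^*$ of vectors supported on the remaining coordinates. All function names are hypothetical, and `brute_force_marginal` is included only as an exponential-time reference for small $l$.

```python
import cmath
import random
from itertools import product

def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def embed(bits, idx, l):
    """Length-l vector carrying `bits` on the coordinates `idx`, zero elsewhere."""
    v = [0] * l
    for b, i in zip(bits, idx):
        v[i] = b
    return tuple(v)

def phase(P, s, theta):
    return cmath.exp(1j * theta * sum((-1) ** dot(row, s) for row in P))

def conditional(P, keep, k, theta):
    """P[m(X) = x | k] for every x, as |E_t[(-1)^(x.t) phase(t + k)]|^2."""
    l, ts = len(P[0]), list(product((0, 1), repeat=len(keep)))
    f = {t: phase(P, [a ^ b for a, b in zip(embed(t, keep, l), k)], theta) for t in ts}
    return {x: abs(sum((-1) ** dot(x, t) * f[t] for t in ts) / len(ts)) ** 2 for x in ts}

def sample_marginal(P, keep, theta, rng=random):
    """Draw one sample of m(X): fresh uniform k, then sample from the conditional."""
    l = len(P[0])
    rest = [i for i in range(l) if i not in keep]
    k = embed([rng.randrange(2) for _ in rest], rest, l)
    dist = conditional(P, keep, k, theta)
    xs = list(dist)
    return rng.choices(xs, weights=[dist[x] for x in xs])[0]

def brute_force_marginal(P, keep, theta):
    """Reference marginal obtained from the full 2^l-entry distribution."""
    l = len(P[0])
    vecs = list(product((0, 1), repeat=l))
    out = {}
    for x in vecs:
        amp = sum((-1) ** dot(x, s) * phase(P, s, theta) for s in vecs) / len(vecs)
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + abs(amp) ** 2
    return out
```

Each call to `sample_marginal` costs time proportional to $n \cdot 4^{|m|}$ in this naive form, independent of how many qubits are traced out.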
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